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(57) ABSTRACT 


Method and system for analyzing aircraft data, including 
multiple selected flight parameters for a selected phase of a 
selected flight, and for determining when the selected phase 
of the selected flight is atypical, when compared with 
corresponding data for the same phase for other similar 
flights. A flight signature is computed using continuous- 
valued and discrete-valued flight parameters for the selected 
flight parameters and is optionally compared with a statis- 
tical distribution of other observed flight signatures, yielding 
atypicality scores for the same phase for other similar flights. 
A cluster analysis is optionally applied to the flight signa- 
tures to define an optimal collection of clusters. A level of 
atypicality for a selected flight is estimated, based upon an 
index associated with the cluster analysis. 

17 Claims, 6 Drawing Sheets 
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IDENTIFICATION OF ATYPICAF FFIGHT 
PATTERNS 

ORIGIN OF THE INVENTION 

5 

The invention described herein was made by employees 
of the United States Government and its contractors under 
Contract No. NAS2-99091 and may be manufactured and 
used by or for the Government for governmental purposes 
without the payment of any royalties thereon or therefor. to 

TECHNICAL FIELD 

This invention relates to digital flight data processing that 
have been recorded on aircraft during flight operations. is 

BACKGROUND OF THE INVENTION 

On a typical day, as many as 25,000 aircraft flights occur 
within the United States, and several times that number 20 
occur throughout the world. Most of these flights are safe. A 
few might exhibit safety issues. Many aircraft are equipped 
with instrumentation that collects from a few dozen param- 
eters to a few thousand parameters every second for the full 
duration of the flight. These types of data have long been 2 s 
used for crash investigations but can also be used for routine 
monitoring of flight operations. The subject invention relates 
to the latter activity. This provides an opportunity to analyze 
this data to identify portions of flights that exhibit safety 
issues. Aviation experts review these flights and recommend 30 
appropriate actions as a result. 

Flight data, recorded during aircraft flight, consist of a 
series of parameter values. Each parameter describes a 
particular aspect of flight. Some parameters relate to con- 
tinuous data such as altitude and airspeed. Other parameters 35 
assume a relatively small number of discrete values (e.g., 
two or three), such as thrust reverser position or flight 
guidance or autopilot command mode. Parameter measure- 
ments are usually made once per second although they may 
be recorded more or less frequently. Hundreds or even 40 
thousands of parameters may be collected for each second of 
an entire flight. These data are recorded for thousands of 
flights. The resulting data for an even modest size set of 
flights are voluminous. 

Conventional methods of finding anomalous flights in 45 
bodies of digital flight data require users to pre-define the 
operational patterns that constitute unwanted performances. 
This can be a hit-or-miss process, requiring the experience 
and knowledge of experts in aviation operations, and it only 
identifies occurrences that specifically match the pre-defined 50 
condition. A conventional flight data analysis tool will find 
the patterns it is told to look for in flight data, but the tool 
is blind to newly emergent patterns for which the tool has 
not been programmed to look. The invention overcomes this 
deficiency because it does not require any pre-specification 55 
of what to look for in bodies of flight data. 

Naturally most flights are typical and exhibit no safety 
issues. Avery few flights stand out as atypical based values 
displayed by the data. These flights may be atypical due to 
one flight parameter being very unusual or multiple param- 60 
eters being moderately unusual. It turns out that these 
unusual flights often exhibit safety issues and thus are of 
interest to identify and refer to aviation safety experts for 
review. Additionally, these atypical flights might display 
safety issues in a manner never envisioned by safety experts; 65 
hence impossible to find using pre-defined exceedences as 
done by the current state of the practice. 


2 

What is needed is an approach that allows identification of 
the most important flight parameters, capture and character- 
ization of the dynamic values of these important parameters, 
and application of a consistent analysis to identify aircraft 
flights which exhibit atypical characteristics. This could 
mean that one or more of these parameters exhibits atypical 
values with respect to a collection of a set of flights that 
collectively define “typical”. This could also mean that 
individual parameters were marginally atypical, but collec- 
tively atypical. The analysis must be extendable to a larger 
or smaller number of “important” parameters and should not 
depend upon choice of a fixed number of such parameters. 
The analysis allows the identification of atypical flights 
without limiting the nature of the atypicalities to envision- 
able or pre-defined conditions. 

In summary, the current state of the art is to monitored 
flight data for specified exceedences (excessive speed, 
g-forces, and other easily definable characteristics that differ 
from standard operating procedures). This invention goes 
beyond that by detecting unusual events, statistical patterns, 
and trends without requiring the pre-definition of what to 
look for and without limiting the investigation to a small 
number of parameters. It does this by applying multivariate 
statistical/mathematical methods. 

SUMMARY OF THE INVENTION 

These needs are met by the invention, which provides an 
approach: (1) to provide a set of time varying flight param- 
eters that are “relevant;” (2) to transform this set of flight 
parameters into a minimal orthogonal set of transformed 
flight parameters; (3) to analyze values of each of these 
transformed flight parameters within a time interval associ- 
ated with the flight phase; (4) to apply these analyses to the 
data for each aircraft flight; and (5) to identify flights in 
which the multivariate nature of these transformed flight 
parameters is atypical, according to a consistently applied 
procedure. 

Digital flight data are passed through a series of process- 
ing steps to convert the massive quantities of raw data, 
collected during routine flight operations, into useful infor- 
mation such as that described above. The raw data are 
progressively reduced using both deterministic and statisti- 
cal methods. In the final stages of processing, statistical 
methods are used to identify flights to be reviewed by 
aviation experts, who infer key safety and operational infor- 
mation about the flights described in the data. These flight 
data processing methods are imbedded in software. 

The analysis begins with a selected subset of relevant 
flight parameters, each of which is believed to potentially 
characterize the nature of a selected aircraft’s flight (q), for 
a selected phase (ph) of the flight (e.g., pre-takeoff taxi, 
pre-takeoff position, takeoff, low altitude ascent, high alti- 
tude ascent, cruise, high altitude descent, low altitude 
descent, runway approach, touchdown and post-touchdown 
taxi.). Application of this criterion often reduces the number 
of flight parameters from a few thousand to a number as low 
as about 100, or lower if desired, referred to herein as 
underlying flight parameters (“FPs”). The data value for 
each record and for each FP is inspected to determine if the 
data are reasonable and should be used to characterize the 
nature of the aircraft’s flight or if it is “bad” data that has 
been corrupted. If the data value is deemed “bad” then it is 
removed from the analysis process for those records that it 
is deemed bad. 

The (remaining) sequence of received FP values is ana- 
lyzed separately for parameters that are interval ratio con- 
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tinuous numbers and for parameters that are ordinal or 
categorical parameters, sometimes referred to as discrete 
value parameters. A continuous value parameter value is 
approximated in each of a sequence of overlapping time 
intervals as a polynomial (e.g., quadratic or cubic), plus an 5 
error term. Each of the sequence of approximation coeffi- 
cients for the sequence of time intervals is characterized by 
a first order statistic, a second order statistic, a minimum 
value and a maximum value, and, optionally, by at least one 
of a beginning value and an ending value for the sequence. 10 
The discrete value parameters are analyzed and character- 
ized in terms of proportion of time at each discrete value and 
number of transitions between discrete values. The continu- 
ous value and discrete value characterization parameters are 
combined as an Mxl vector E for each flight. The set of is 
flights is combined to form a matrix for which a covariance 
matrix F is computed. 

An eigenvalue equation, F-V(L)=LV(X), is solved. The 
data matrix formed by combining the Mxl vectors E for the 
set of flights is transformed by a data matrix to form a new 20 
matrix G. The set of all eigenvalues can be, and preferably 
will be, replaced by a reduced set of eigenvalues having the 
largest values. 

A cluster analysis is performed on the new matrix G, with 
each flight being assigned to one of the clusters. The 25 
Mahalanobis distance for the flight with respect to the mean 
of all the flights (based on the G matrix) forms an estimate 
of the atypicality score for each flight, q, in each phase, ph. 
This atypicality score for flight q and phase ph is combined 
with the proportion of flights in the cluster flight q/phase ph 30 
was associated to calculate a new atypicality value, referred 
to as a Global Atypicality Score (GAS). 

The Global Atypicality Scores for all the flights are 
ranked in decreasing order. The flights in the top portion 
(typically 5%) are labeled "atypical” ("Level 2” and "Level 
3”) and the most atypical of these flights are identified as 
"Level 3”. These flights are brought to the user’s attention in 
a list. The user can select any of these flights and drill down 
to get additional information about the flight, including 
comparison of its parameter values to the values of other 4 
flights. 

BRIEF DESCRIPTION OF THE DRAWINGS 

... . . 45 

FIG. 1 is a histogram of a representative group of flights, 

illustrating the appearance of two statistical outliers for 
fictitious flights. 

FIG. 2 illustrates a dendogram display of hierarchical 
clustering. 50 

FIG. 3 is a flow chart of a procedure for practicing an 
embodiment of the invention. 

FIG. 4 is a schematic view of a system for practicing the 
invention. 

55 

DESCRIPTION OF BEST MODES OF THE 
INVENTION 

A sequence of values for each of a selected set of P 
relevant flight parameters FP is received, and unacceptable 60 
values are removed according to one or more of the follow- 
ing: (1) each value u ;l of a sequence is compared with a range 
of acceptable values, U1 =u = U2, and if the parameter value 
u ;l lies outside this range, this value is removed from the 
received sequence; and (2) a first difference of two consecu- 65 
tive values, u M _ 1 , and u„, is compared with a range of 
acceptable first differences, A 1 Ul=u„-u„_ 1 = A 1 U2, and if 


the computed first difference lies outside this range, at least 
one of the values, u M _ 1 , and un, is removed from the received 
sequence. 

For continuous value parameters, each such parameter is 
analyzed by applying a time-based function over each of a 
sequence of partly overlapping time intervals (t fl0 , Lo^^) 
of substantially constant temporal length (N values) to 
develop, for each such time interval and for each FP, a 
polynomial approximation in a time variable t, plus an error 
coefficient. For example, the polynomial may be a quadratic 
sum, such as 

p(nOWt;app)~p 0 


(rio)+Pi(><o)it-t„o ) 2 


+e(n0) 


(1A) 


N+nO-l (IB) 

d(n0) = (N-3r 1 Yj e 

n=nO 


including an error coefficient e(n0) that (i) is minimized for 
each time interval, t„ 0 ^t^t^^.j, by appropriate choice of 
the coefficients p 0 , pj and P 2 and (ii) reflects how closely the 
actual FP data are approximated by the corresponding time 
dependent polynomial for the corresponding time interval. 

For the sequence of time intervals in the selected phase for 
the selected FP, each of the sequence of coefficients 
W n °)}„o> {pi( n0 )Lo> {p 2 ( n °)}„o and {d(nO)}„ 0 , consid- 
ered as a vector v of entries, is characterized by character- 
ization parameters, which include a first order statistic ml(v) 
(e.g., weighted mean, weighted median, mode), by a second 
order statistic m2(v) (e.g., standard deviation), by a mini- 
mum value min(v), by a maximum value max(v), and 
optionally by a beginning value begin(v) and/or by an 
ending value end(v) for that coefficient sequence. The col- 
lection of these characterization parameters is formatted and 
stored as an Mxl vector El, representing the collection of 
time intervals for that phase (ph) for that flight parameter for 
that flight (q). 

Each ordinal or categorical parameter (sometimes 
referred to as a discrete-valued parameter), numbered 
k2=l, . . . , K2 and having L(k2) discrete states, is analyzed 
by forming a square transition matrix, with each row and 
each column representing each of the possible states or 
values of the parameter(s). Each data point from the full 
flight phase is processed by counting the number of transi- 
tions N, , +1 from a state S,- on record i to an immediately 
subsequent state S i+1 on record i+1, including the number of 
transitions of a state to itself. Each diagonal entry in this 
transition matrix is divided by the sum of the original 
diagonal values, to convert the matrix to an L(k2) 2 xl vector 
E k , where L(k2) is the number of distinct values for this 
parameter, k2. The set of vectors E2^ for all the discrete 
parameters of the phase for this flight are concatenated into 
a vector E2, that is Lxl, where L is the sum of L(k2) 2 over 
all k2=l, . . . , K2. 

The discrete parameter vector(s) for each phase and for 
the phase ph is/are combined with the MjXl vector El for 
continuous value parameters to form an Mxl row vector E 
(M=M1+L) that includes the contributions of continuous 
and discrete value parameters. The E vectors from each of 
the Q flights in the set selected to be studied are combined 
to form a matrix, denoted as DM. Optionally, vectors E for 
adjacent phases can be combined to perform a multiple 
phase analysis, if desired. 
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An MxM covariance matrix 


T(x) is an incomplete gamma function. 


F=cov(F) (2) 

is formed, which is symmetric and non-negative definite, 
and an eigenvalue equation 

FV(k)=kV(k) (3) 


is solved to determine a sequence of M=M 1 +L eigenvalues 
\ i with The eigenvalue equation (3) can be 1Q 

solved in a straightforward manner, or a singular value 
decomposition (SVD) approach can be used, as described by 
Kennedy and Gentle in Statistical Computing, Marcel Dek- 
ker, Inc., 1980 pp 278-286, or in any other suitable numeri- 
cal analysis treatment. (The method used is equivalent to 
what is known as principle component analysis.) One works 15 
with a selected subset {X',} of these eigenvalues, which may 
be a proper subset of M' eigenvalues (M' = M), where 


M' M (4) 20 

i= i i=i 


and f is a selected fraction satisfying 0<f£l for example, 25 
f=0.8 or 0.9. 

A transformed matrix 

G=DMF (5) 

is then computed. Preferably, the matrix G is normalized by 30 
subtraction of a first order statistic of each column and by 
division of the difference by a second order statistic asso- 
ciated with that column. 

An atypicality score, also referred to as a Mahalanobis 
distance, 


M' 

A q = (1/(M'-3))Z (GFf/X'j 

j= 1 


35 


(6) 

40 


is computed for each flight (q) and each phase (ph). 

The atypicality scores for the selected set of flights can be 
compared using a histogram of reference atypicality scores 45 
for a collection of reference flights. An atypical flight will 
often appear as a statistical outlier, as illustrated in FIG. 1 for 
two fictitious flights "2064” and "1743”. This one dimen- 
sional approach has the advantage of simplicity of interpre- 
tation. 50 

A p-value, corresponding to an atypicality score A q , the 
selected flight q and the selected phase ph, is defined using 
the Wishart probability density distribution as defined in 
Anderson , An Introduction to Multivariate Statistical Analy- 
sis, 2 nd Edition , John Wiley & Sons, 1984, pg 244-255. 


p(q:pli)=(Fl-F2)/(F3-FA-FS) (7A) 

where 

F1=\A/ r ~ m - 1 '> (7B) 

F2=exp(-(‘/2) trace(S-M,)) (7C) 

F3= (7D) 
F4=KI 1/2 *, (7E) 

F5=n" i=i r((‘/4) (F+l-;)) (7F) 


60 


65 


A cluster analysis is applied to a collection of observed 
values G (from Eq. (5)) for the same phase and for the full 
set of selected flight(s). A preferred cluster analysis is 
K-means analysis, as set forth in any of a number of statistics 
and data mining books, including Kennedy, Lee, Roy, Reed 
and Lippman, Solving Data Mining Problems Through Pat- 
tern Recognition , Prentice Hall PTR, 1995-1997, page 
10-50 through 10-53. The clustering is performed for each 
phase (or aggregated group of phases) separately. 

The initialization step requires selection of the number K 
of clusters, and the setting of the initial seed values. There 
are a number of ways to set these seeds; including using (i) 
a random selection of K flight vectors U from the full set of 
flight vectors, (ii) a random selection of dimension values 
for each of the K flight vectors, (iii) setting the seeds to be 
all zeros in all dimension but one and that value is a 
maximum or minimum of that value among all flight vec- 
tors. There are many other ways as well. The first method is 
a preferred method. These seeds take the role as the initial 
values of the cluster centers or centroids. 

The next step requires that the distance from each cluster 
centroid to each flight vector is calculated. A flight vector is 
associated with the cluster that has the minimum flight 
vector-to-center distance. There are numerous methods to 
calculate distance, including Euclidian distance, Manhattan 
distance and cosine methods. A preferred method is the 
Euclidean distance. 

After associating every flight vector U with a cluster, the 
centroid for each cluster k is calculated as the mean or first 
order statistic in each dimension of the flight vectors that are 
associated with cluster k. 

These last two steps are repeated until the number of flight 
vectors changing cluster membership is below some thresh- 
old or an upper limit of number of iterations is reached. 

A second preferred cluster analysis method is hierarchical 
clustering, which works with partitions of the collection of 
observations that are built up (agglomerations) or that are 
divided more finely (divisions) at each stage. Hierarchical 
methods are discussed by B. S. Everitt, ibid, pp. 55-89. 
Other cluster analysis can also be performed using any of the 
approaches set forth in B. S. Everitt, pp 37-140. 

Hierarchical clustering initially assigns each flight, 
q=l, . . . , Q, to its own cluster, c=l, . . . C. Then the 
“distance” between all possible flight vectors pairs is cal- 
culated using the G matrix and identify the two flight vectors 
with the minimum distance. There are numerous methods to 
calculate distance, including Euclidian distance, Manhattan 
distance and cosine methods. A preferred method is the 
Euclidean distance. These flight vectors are associated with 
a cluster. The cluster’s centroid is calculated based on all its 
members, denoted by cc, 1, . . . , CC. 

After the first cluster is formed, calculate the distance 
between all possible pairs from Q-l objects (Q-2 flight 
vectors and 1 cluster), find the pair with the minimum 
distance and assign them to a cluster. This may be a pair of 
flight vectors or a flight vector with a cluster (and if there are 
multiple clusters, as there inevitably will be, it could be two 
clusters jointed to form one larger cluster). Continue this 
process of calculating distances, finding the minimum dis- 
tance and assigning flights or clusters to form bigger clusters 
until all have been aggregated to one global cluster. 

FIG. 2 illustrates this process graphically in a dendogram. 
The user has the option of how many clusters to use. One 
could choose any number from 2, . . . (Q-l). One could cut 
the dendogram horizontally to form K clusters or at different 
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levels for different clusters. The options commonly used are: 
(1) to specify the number of clusters and cut horizontally, (2) 
to look for long vertical branches in the dendogram and cut 
horizontally at that level, (For FIG. 2 this would result in 10 
clusters.), and (3) to calculate a index of cluster homoge- 
neity as a function of the sum of the squares of within-cluster 
distances and between-cluster distances. A preferred method 
is the first. References to these and other acceptable tech- 
niques can be found in Webb, Andrew. Statistical Pattern 
Recognition. Oxford University Press Inc. New York. 1999. 
pages 308-310. or G. W. Milligan and M. C. Cooper. An 
examination of procedures for determining the number of 
clusters in a data set. Psychometrika, 50(2): 159-179, 1985. 

A cluster membership score CMS(q;ph), equal to a mono- 
tonic function of a ratio, the number of observations in that 
cluster, divided by the total number of observations 
(0<CMS<1), is then computed for the selected flight (q) and 
the selected phase (ph). A larger value of CMS corresponds 
to a less atypical set of observed values for the selected flight 
(q) and the selected phase (ph), and inversely. 

A Global Atypicality Score GAS for a selected flight (q) 
and selected phase (ph) is then defined as 

GAS(q;ph)=-\o% z {p(q;ph)}-\og t {CMS(q;ph)}, ( 8 ) 

where z is a selected real number greater than 1. According 
to the definition in Eq. (8), a Global Atypicality Score GAS 
increases with decreasing p-values and with decreasing 
CMS values. A probability value Pr can be assigned to each 
GAS value that decreases with an increase in the GAS value. 
The logarithm functions in Eq. (8) can be replaced by 
another function Fn that is monotonic in the argument, such 
as 

GAS (q;ph)=wl • Fn{p(q;ph ) } 

+(1 -w)-Fn{CMS(q;ph)}, ( 9 ) 

where w is a number lying in the range 0=w = 1 . 

FIG. 3 is a flow chart of a procedure for practicing the 
invention. In step 1, one or more sequences of flight param- 
eter (FP) values are received for a selected phase (ph) for a 
selected flight (q), for each of a sequence of overlapping 
time intervals, and unacceptable parameter values are iden- 
tified and removed from one or more sequences. 

In step 2, applicable to a parameter with continuous 
values, polynomial coefficients p o (n0), p 1 (n0) and p ; (n0) 
and an error coefficient e(n0) are determined for a polyno- 
mial approximation p(t;app)«p 0 (nO)+p 1 (nO)(t-t„)+p 2 (nO) 
(t-t ;i ) 2 +e(n0), where the coefficients p 0 , p 2 and P 2 are chosen 
to minimize the magnitude of e. The collections of coeffi- 
cients {p 0 (nO)}„ 0 , {p 1 (n 0 )}, i0 , {p 2 (nO)}„ 0 and (d(n0)= 

(N-3) 1 2e(n0) 2 },, are treated as entries for the respective 
vectors v=A, B, C and D, for the selected flight (q) and the 
selected phase (ph). A first order statistic ml(v), a second 
order statistic m2(v), a minimum value min(v) and a maxi- 
mum value max(v), and optionally at least one of a begin- 
ning value begin(v) and an ending value end(v), are com- 
puted for each of the vectors v=A, B, C and D. An Mlxl 
vector El is formed, including the entries of the vectors A, 
B, C and D. 

In step 3, for each of the overlapping time intervals, an 
L(k2)xL(k2) matrix is formed whose entries are the number 
of transitions from one of L(k2) discrete values to another of 
these discrete values of an FP; each of the original diagonal 
values of the L(k2)xL(k2) matrix is divided by the sum of 
the original diagonal values so that the sum of the diagonal 
entries of this modified L(k2)xL(k2) matrix has the value 1. 
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An Lxl vector E2 is formed from the entries of the modified 
L(k2)xL(k2) matrices, where L is the sum of the squares 
L(k2) 2 . 

In step 4, an Mxl vector E, including the entries of the 
5 vectors El and E2, is formed, where M=M1+L. In step 5, an 
MxM covariance matrix F=cov(E) is computed. 

In step 6, eigenvalues k for an eigenvalue equation, 
FV(X)=XV(X), are obtained, where kl=k2= . . . SXMgO, 
and a selected subset of these eigenvalues, 
10 k'l^k'2^ . . . X'M'gO, is provided, where M'^M. 

In step 7, a transformed matrix G=DMF is provided, 
where DM is a selected data matrix. 

In step 8, an atypicality score, Aq is calculated based on 
the M' variables for the selected set of flights and the 
IS selected phase (ph), as set forth in Eq. (6). 

In step 9 (optional), the computed atypicality score, A , 
for the selected flight is compared with a reference histo- 
gram of corresponding atypicality scores for a reference 
collection of similar flights with the same phase (ph), and an 
20 estimate is provided of a probability associated with the 
computed atypicality score relative to the reference collec- 
tion. Step 9 is a simplified alternative to cluster analysis, 
which is covered in steps 10-15. 

In step 10, a p-value corresponding to the computed 
25 atypicality score is provided for the selected flight and/or for 
one or more similar flights with the same phase (ph), as 
determined by A q . 

In step 11, an initial collection of M'-dimensional clusters 
is provided for the atypicality scores, A q . 

.to In step 12, a selected cluster analysis, such as K-means 
analysis or hierarchical analysis, is performed for the cluster 
collection provided. Each atypicality score is assigned to 
one of the clusters, and a selected cluster metric value or 
index is computed. 

35 In step 13, membership in the clusters is iterated upon to 
determine a substantially optimum cluster collection that 
provides an extremum value (minimum or maximum) for 
the selected cluster metric value or index. 

In step 14, a cluster membership score (CMS) is com- 
40 puted for each cluster, equal to a monotonic function of a 
ratio, the number of observations (atypicality scores) asso- 
ciated with each cluster, divided by the total number of 
observations in all the clusters. 

In step 15, a global atypicality score GAS is computed as 
45 a — a linear combination of a selected monotonic function Fn 
applied to the p-value and the selected function Fn applied 
to the CMS, for the selected flight(s) and the selected phase 
(ph). 

FIG. 4 is a schematic view of a computer system 30 for 
50 practicing the invention. The sampled values (continuous 
and/or discrete) are received at an input terminal of an 
acceptance module 31 that performs step 1 (FIG. 3 ) and 
determines which sampled values are acceptable. The 
acceptable values are presented to a matrix analysis module 
55 32 , which (i) distinguishes between continuous and discrete 
parameter values and (ii) performs the polynomial approxi- 
mation analysis and statistical analysis and (iii) forms the 
vectors El, E2 and E, as in steps 2, 3 and 4. The vector E 
is received at a covariance calculation module 33 , which 
60 generates and issues the matrix F=cov(E), as in step 5. The 
matrix F is received by an eigenvalue analyzer 34 , which 
solves the eigenvalue equation, F-V(X)=XV(X) and stores the 
eigenvalues X=X1, . . . , XM, as in step 6. Optionally, the 
eigenvalue analyzer 34 identifies a selected subset of M' 
65 eigenvalues. A transformed matrix G=DMF is formed in a 
matrix transformation module 35 , as in step 7, where DM is 
a matrix of selected FP values. The eigenvalues X'i and the 
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entries of the transformed matrix G are received by an 
atypicality calculator 36, which calculates an atypicality 
score or flight signature, as in step 8. The atypicality score 
is optionally analyzed by a histogram comparator module 
37, as in step 9. 

A collection of one or more atypicality scores is received 
by a p-value module 38, which calculates a p-value for the 
collection, as in step 10 (FIG. 3). A cluster analysis module 
39 receives the G matrix and determines an optimal assign- 
ment of each flight vector to one of K clusters. A cluster 
membership score (CMS) is computed by a CMS module 
40 , as in step 14. A GAS module 41 receives the p-value 
score(s) and the CMS score(s) and computes a global 
atypicality score (GAS), as in step 15. 

A GAS value for a selected flight (q) and selected phase(s) 
(ph) may be compared with a spectrum of GAS values for 
a collection of reference flights for the same phase(s) to 
estimate a probability associated with the GAS for the 
selected flight. A GAS value for a selected flight may, for 
example, be placed in the most atypical 1 percent of all 
flights, in the next 4 percent of all flights, in the next 16 
percent of all flights, or in the more typical remaining 80 
percent of all flights. 

Assume that the selected flight atypicality score is 
assigned to a given cluster, SFC. The GAS value for that 
selected flight will decrease as the CMS for the cluster SFC 
increases, and inversely. An increased CMS value for the 
SFC corresponds to enlargement of the SFC. The logarithm 
function -log,(x) manifests increased sensitivity to change 
of the argument x as x approaches 0. 

What is claimed is: 

1 . A method for analyzing aircraft flight data, the method 
comprising: 

(i) receiving flight data for measurements of each of P 
selected parameters {m(t;k;q)| (k=l, . . . , P) at each of 
N selected times (t=t„) (n=n0, . . . , nO+N-1; N§;2) for 
one or more selected flights (q) of one or more aircraft; 

(ii) for each continuous-valued parameter p(t;kl) of each 
flight, numbered kl=l, . . . , K1 (K1S0), and for a 
selected sequence of the times t=t„ (n=n 0 , nO+1, . . . , 
n=nO+N-l, providing a polynomial approximation p(t; 
kl; app)= a (t„ 0 ;kl)+b (t, l0 ;kl)-(t-t„ 0 )+c(t„ 0 ;kl).(t-t„ 0 ) 
2 +e(f, l0 ;kl), where e (hiO ;kl) is an error term, whose sum 
of the squares d(t, l0 ;kl)=(N-3)' 1 *2e(t, i ;kl) 2 , is mini- 
mized by the choice of the terms a(t„ 0 ;kl), b (t„ 0 ; kl) 
and c(t„ 0 ;kl); 

(iii) forming vectors A={ a (Wkl)} B={b(t, l0 ;kl)} 
C={c(t„ 0 ;kl)}„ 0 , and D={d(t„ 0 ;kl)}„ 0 , forming an 
Mlxl vector El including a first order statistic ml(v), 
a second order statistic m2(v), a minimum value min(v) 
and a maximum value max(v) for each of the vectors 
v=A, v=B, v=C and v=D; 

(iv) for each discrete-valued parameter, numbered 
k2=l . . . , K2 (K2=0) and having L(k2) discrete 
values, and for the selected sequence of times, forming 
an L(k2)xL(k2) matrix whose entries are the number of 
transitions between any two of the L(k2) discrete 
values of this parameter, dividing each of the original 
diagonal entries by a sum of the original diagonal 
entries of the L(k2)xL(k2) matrix to form a modified 
L(k2)xL(k2) matrix, and forming an Lxl vector E2 of 
entries from the modified L(k2)xL(k2) matrices, where 
L is the sum of the values L(k2) 2 ; 

(v) forming an Mxl data vector E with entries including 
ml(v), m2(v), min(v) and max(v) for each of the 
vectors v=A, v=B, v=C and v=D, and including the 
entries of the modified Lxl vector, where M=M1+L; 
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(vi) computing a covariance matrix F=cov(E); 

(vii) computing eigenvalues, X=X1, X2, . . . , XM, for an 
equation F-V(X)= XV(X), where X1=X2 = . . . =XM; 
and 

5 (viii) computing a transformed matrix G=DMF, where 
DM is a selected data matrix. 

2. The method of claim 1, further comprising: 

providing at least one sub-sequence of at least one of said 

values m(t„;klq), and computing a selected linear com- 

1° bination of one or more of said values m(t„;klq) in the 
sub-sequence; 

comparing the computed linear combination of said val- 
ues with a reference range of values for the computed 
linear combination; and 

15 when the computed linear combination of said values 
does not lie within the reference range, interpreting this 
condition as indicating that at least one of said param- 
eter values in the sub-sequence is unacceptable. 

3. The method of claim 2, further comprising: 

20 when said computed linear combination of said values lies 
within said reference range, interpreting this condition 
as indicating that said values in said sub-sequence are 
acceptable. 

4 . The method of claim 1 , further comprising computing 
an atypicality score A q , defined as 

M' 

/\,=(1/(M'-3))V (G„9 2 MJ, 

.50 

where G qj is an entry in said matrix G and 
{XT, X2, . . . , XM'} is a selected subset of said eigenvalues 

35 {XI, X2, . . . , X'M'}, with M’iM. 

5 . The method of claim 4 , further comprising comparing 
said computed atypicality score A (? with a histogram of 
reference atypicality scores for said selected phase for a 
collection of at least one reference flight. 

40 6 . The method of claim 4 , further comprising: 

when said atypicality score A q is greater than a selected 
percentage, PCT, of all atypicality scores in said his- 
togram, interpreting this condition as indicating that a 
selected phase (ph) for said selected flight is atypical, 

45 as compared to a percentage of said reference atypi- 
cality scores, where PCT is a selected number at least 
equal to 80 percent. 

7. The method of claim 6, further comprising choosing 
said selected percentage PCT from a group of percentages 

50 consisting of 80 percent, 90 percent, 95 percent and 99 
percent. 

8. The method of claim 6, further comprising selecting 
said phase of said selected flight from among the phases 
pre-takeoff taxi, pre-takeoff position, takeoff, low altitude 

55 ascent, high altitude ascent, cruise, high altitude descent, 
low altitude descent, runway approach, touchdown and 
post-touchdown taxi. 

9 . The method of claim 4 , further comprising computing 
a p-value associated with said atypicality score A ? , defined 

60 as 

p(q;ph)=Fl-F2/(F3F4FS), 

Fl=lA q f R - M ~ 1 ) 

66 F2=exp(-( 1 /i) trace(2 _1 A q )) 

F3=2-" R *jt"< M - 1 >' 4 
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F4=E! 1/2 * 

F5=n" fel r{(i/2)(R+i-0}, 

where r(x) is an incomplete gamma function. 

10 . The method of claim 9 , further comprising: 5 

assigning each of a group of observation vectors U, whose 

entries are drawn from entries of said transformed 
matrix G, to one of two or more clusters, using a 
selected cluster analysis procedure; 

for each modified cluster, providing a cluster membership to 
score CMS(q;ph) that is a strictly monotonic function 
of the number of observation vectors U in the cluster 
divided by the total number of observation vectors in 
all clusters; and 

computing a global atypicality score, GAS, defined as 15 
GAS(q;ph)=w*Fn{p(q;ph)}+(l-w)*Fn{CMS(q;ph)}, 
where Fn is a selected monotonic function and w is 
a selected weight lying between 0 and 1. 

11 . The method of claim 10 , further comprising selecting 
said monotonic function Fn to be Fn{s}=-log,{s}, where z 20 
is a selected number greater than 1. 

12 . The method of claim 10 , wherein said selected cluster 
analysis procedure comprises: 

(1) providing an initial set of at least two clusters 

(2) providing a cluster centroid for each cluster; 25 

(3) assigning each of said group of observation vectors U, 

whose entries are drawn from entries of said trans- 
formed matrix G, to the cluster for which a distance 
from the centroid to said vector U is a minimum among 
all centroids; 50 

(4) computing a modified centroid for each cluster from 
said vectors U assigned to the cluster; 

(5) assigning each of said vectors U to a modified cluster 
associated with the modified centroid for which the 
distance from the modified centroid to said vector U is 55 
a minimum among the distance for all modified cen- 
troids; 

(6) repeating steps 3 , 4 and 5 until at least one of two 
conditions is met: (i) the number of iterations is greater 
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that a maximum allowed number of iterations, or (ii) 
the number of flights that change cluster membership 
between iterations is below a selected threshold; and 

(7) for each modified cluster, providing said cluster mem- 
bership score CMS(q;ph). 

13 . The method of claim 10 , further comprising: 
comparing said computed GAS for said computed atypi- 
cality score A q with GAS scores for at least first, second 
and third atypicality scores A q ; and 

estimating a level of atypicality for the first computed 
atypicality, based upon number of GAS that are less 
than the first computed GAS and number of GAS that 
are greater than the first computed GAS. 

14 . The method of claim 10 , further comprising: 
when said computed GAS for said computed atypicality 

score A q lies in a selected atypicality range, interpreting 
this condition as indicating that said flight parameter 
values for at least one phase ph for said flight number 
q are atypical. 

15 . The method of claim 10 , further comprising: 
when said computed GAS for said computed atypicality 

score A q does not lie in a selected atypicality range, 
interpreting this condition as indicating that at least one 
of said flight parameter values for at least one phase ph 
for said flight number q is not atypical. 

16 . The method of claim 10 , wherein said selected cluster 
analysis procedure comprises a hierarchical cluster analysis 
procedure. 

17 . The method of claim 1 , further comprising: 
including in said vector El at least one of: (i) a sequence 

of beginning values, denoted begin(v), for each of said 
vectors v=A, v=B, v=C and v=D, and (ii) a sequence of 
ending values, denoted end(v), for each of said vectors 
v= A, v=B, v=C and v=D. 



